METHOD AND CIRCUIT ARRANGEMENT FOR SYNCHRONIZATION OF 
SYNCHRONOUSLY OR ASYNCHRONOUSLY CLOCKED PROCESSOR UNITS 

CLAIM FOR PRIORITY 
5 This application claims the benefit of priority to European 
Applications EP 02020602.5, filed September 12, 2002, and EP 
02027847.9, filed December 12, 2002, which were filed in the 
German language, and to U.S. Provisional Application No. 
60/432,671, filed December 12, 2002, which was also filed in 
10 the German language. 

TECHNICAL FIELD OF THE INVENTION 
The present invention relates to a system and method for syn- 
chronization of synchronously or asynchronously clocked proc- 
15 essor units. 

BACKGROUND OF THE INVENTION 
In telecommunications systems, in Data Centers and other 
high-availability systems, up to several hundred processor 
20 boards are used to provide the necessary processing power. 
This type of processor board typically consists of a proces- 
sor or CPU (Central Processing Unit), a chip set, random ac- 
cess memory and peripheral components. 

25 The likelihood that a hardware defect will occur in a typical 
processor board within any given year is a single-digit per- 
centage figure. Because of the large number of processor 
boards combined to form a system, there is a very great like- 
lihood of one of the hardware components failing within the 

30 one-year period, in which case a single failure of this na- 
ture can, unless adequate precautions are taken, lead to a 
failure of the entire system. 

Telecommunications systems, and increasingly Data Centers, 
35 are now subject to demands for high availability. This is for 
example expressed as a percentage or the maximum permitted 
downtime per annum is specified. Typical requirements are for 



example an availability rate of >99.999% or a non- 
availability rate of a few periods of 10 minutes per year at 
most. Since changing a processor board and restoration of the 
service in the case of a hardware defect normally takes some 
5 time, ranging from 10 minutes or so up to several hours, the 
corresponding precautions must be taken for the case of a 
hardware defect at system level to enable to demand for sys- 
tem availability to be fulfilled. 

10 Known solutions for maintaining these types of high system 

availability requirements make provision for redundant system 
components. The known methods can be subdivided into two main 
groups: software-based methods and hardware -based methods. 

15 Software-based methods typically use middleware. Software- 
based solutions, however, are typically not very flexible 
since only the (application) software that was developed for 
this particular redxindancy scheme can be used in such a sys- 
tem. This significantly reduces the range of (application) 

20 software that can be used. In addition, the application soft- 
ware for software redundancy principles is extremely e3q)en- 
sive to develop in practice, whereby the development also 
brings with it a complicated test procedure. 

25 The basic principle of hardware-based methods rests on encap- 
sulating redundancy at a hardware level such that it is 
transparent for the software. A major advantage of redundancy 
which is managed by the hardware is that the application 
software is not affected by the redundancy principle and 

30 thereby in most cases any given software can be used. 

One principle that is encoiintered frequently in practice for 
hardware fault tolerant systems for which redundancy is 
transparent for the software is what is known as the lockstep 
35 principle. Lockstep means that identically constructed hard- 
ware, e.g. two boards, are operated simultaneously with syn- 
chronized timing. Hardware mechanisms are used to ensure that 
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the redundant hardware experiences identical input stimuli at 
a given point in time and must therefore arrive at identical 
results. The results of the redundant components are com- 
pared, and a fault is determined if there is a discrepancy 
5 and suitable measures are initiated (sending of alarms to op- 
erators, partial or complete safety shutdown, system re- 
start) . 

The underlying requirement for the implementation of a lock- 
10 step system is the clocked deterministic behavior of all the 
components contained in the board, i.e. CPUs, chip sets, main 
memory etc. In this case, clocked deterministic behavior 
means that, if they are not faulty, the components will re- 
turn identical results at identical points in time when the 
15 components receive identical stimuli at identical timing 

points. Clocked deterministic behavior also requires the use 
of synchronous timing interfaces. Asynchronous interfaces 
cause a degree of imprecise timing in many cases, in which 
case the clock synchronous overall behavior of the system 
20 cannot be maintained. 

However, it is precisely for chip sets and CPUs that asyn- 
chronous interfaces offer technological benefits in increas- 
ing performance, which makes a clock-synchronous operating 

25 mode in accordance with the lockstep procedure impossible. In 
addition, modern CPUs increasingly use mechanisms that make a 
clock-synchronous operating mode impossible. These are for 
example internal correction measures not visible to the out- 
side world, e.g. correction of an internal correctable error 

30 for access to a cache memory which can lead to a slight delay 
in instruction processing, or the speculative execution of 
commands. A further example is the increasing implementation 
in the future of CPU-internal clock-free execution units 
which provide significant benefits as regards speed and power 

35 dissipation but prevent clock-synchronous or deterministic 
operation of the CPU. 



A functional lockstep arrangement for redundant processors is 
disclosed in U.S. Patent No. 5,226,152 in which the proces- 
sors are connected to a logic which synchronizes the access 
by the processors to the shared periphery and makes possible 
5 a functional lockstep operation of the redundant processors. 
In this case, the logic uses the wait signal of the proces- 
sors. 

With regard to the processor boards mentioned above, this ar- 
10 rangement, which features a single central logic , has one 
significant disadvantage, which is that in addition to the 
processor boards a logic board which then controls the syn- 
chronization of the peripheral accesses has to be provided 
for a specific number of processor boards. These logic boards 
15 must then be monitored in their turn, which would lead to 
complex monitoring mechanisms . 

In other words, while the arrangement in accordance with U.S. 
Patent No. 5,226,152 seems suitable for providing a func- 
20 tional lockstep for single-board systems with a nxamber of 

processors, this arrangement is not suitable for systems with 
a number of similar or identical processor boards of the type 
mentioned at the start. 

25 SUMMARY OF THE INVENTION 

The present invention specifies a procedure which preseirves 
the advantages of the lockstep method and which takes account 
of technological development. 

30 In one embodiment of the invention, there is a method pro- 
vided for synchronization of identical or different, redun- 
dant processing units PROo, PROi which process identical in- 
struction sequences and are synchronously or asynchronously 
clocked, in accordance with which transactions that are ex- 

35 temal to the processing units PROo, PROi are used by modules 
EQo, EQi assigned to processing \inits PROo/ PROi for synchro- 
nization of the processing xinits PROo, PROi with the process- 



ing units being delayed by the assigned modules and thereby 
synchronized until the instruction execution of all process- 
ing units has reached the current transaction. 

5 In this case, the following transactions can be used for syn- 
chronization: 

- non-cacheable memory transactions relating to a local 
memory MEMo, MEMi assigned to the relevant processing 
units PROo, PROi and/or 

10 - input/output transactions for input/output modules I/Oo/ 
I/Oi and/ or 

- memory-mapped input /output transactions for external reg- 

isters REGo, REGi and/ or 

- non-cacheable memory transactions relating to a common 
15 memory CMEM of processing units PROo/ PROi. 

Read transactions can be executed in this case by the module 
assigned to a processing unit leaving the processing unit in 
the wait state until the data to be read arrives and sending 

20 the parameter or parameters of the read transaction to the 
module linked most directly to the transaction destination 
I/Oo,I/Oi, MEMo, MEMi, REGo, REGi, CMEM, whereby the module 
linked most directly to the transaction destination receives 
and compares the parameter or parameters of the other modules 

25 as well as the locally created parameters and, if they match, 
executes the read transaction and distributes the read data 
to the modules, whereafter the modules forward the read data 
to the assigned processing \inits and enable instruction proc- 
essing to be continued. 

30 

Write transactions can be executed by the module assigned to 
a processing unit leaving the processing unit in the wait 
state until the write process is concluded and sending the 
parameter or parameters of the write transaction to the mod- 
35 ule linked most directly with the transaction destination 
I/Oo,I/Oi, MEMo/ MEMi, REGq, REGi, CMEM, whereby the module 
most directly connected to the transaction destination re- 
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ceives and compares the parameter or parameters of the other 
modules as well as the parameter or parameters created lo- 
cally and executes the write transaction if they match and 
acknowledges the completion of the write process to all mod- 
5 ules, whereafter the modules enable the instruction execution 
of the assigned processing units to be continued. 

Advantageously, external events, e.g. interrupts, can be han- 
dled in conjunction with the transaction-based synchroniza- 

10 tion method in accordance with the invention if the handling 
of the external events is initiated through the reading of a 
value, e.g. an interrupt vector, from a memory location or a 
register and it is also ensured that the processing units are 
presented with the external events at the same point in in- 

15 struction execution. The read transaction initiating event 

handling is executed as described above, e.g. by means of an 
Interrupt Acknowledge Cycle. 

A suitable method of synchronization of external events is 
20 described in European Patent Application 02020602 and makes 
provision for the external events to be buffered, whereby the 
stored external events are called in a special operating mode 
of the processing units for processing by at least one execu- 
tion unit of the processing units and whereby the processing 
25 units in this operating mode come in response to the fulfill- 
ment of a condition that can be pre-specif ied by commands or 
is fixed and the continuation of instruction execution by the 
modules EQo, EQi is delayed until the processing units have 
ended the special operating mode. 

30 

The changeover to the special operating mode is, for example, 
made if a match is determined by comparator elements K of the 
processing units between counter elements CIC and register 
elements MIR, whereby the content of register elements MIR 
35 can be pre-specif ied by commands and is identical for the 
processing units PROo, PROi and the co\int element CIC in- 
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eludes the niomber of instructions executed by the execution 
units since the last change into the special operating mode. 

Error handling can be initiated if the module linked most di- 
5 rectly with the transaction destination establishes a devia- 
tion of the parameters of the other modules as well as of the 
locally created parameter ( s ) . In this case, error handling 
can stop the transaction to be executed and start a routine 
for diagnosis, fault isolation and if necessary restoring 
10 synchronicity . If N (e.g. N=3) processing units are avail- 
able, a one (N-1) of N majority decision of generally an (N- 
M) of N majority decision can be made and the deviating proc- 
essing unit(s) can be deactivated. 

15 Failure detection can also be undertaken for individual proc- 
essing units, in which, for any given transaction, beginning 
with the earliest availability of the parameter (s) at the 
module of a processing unit, parameters not arriving or only 
arriving after a pre-specif ied time are rejected, whereby er- 

20 ror handling is initiated for processing units with parame- 
ters not arriving or only arriving after expiry of a pre- 
specif ied period. 

In another embodiment of the invention, there is an arrange- 
25 ment to synchronize synchronously or asynchronously clocked 
processing units PROo, PROi of redundant data processing sys- 
tems, with the following features: 

- at least two processing units PROo, PROi for processing 

identical instruction sequences, 
30 - peripherals MEMo, MEMi assigned exclusively to the proc- 
essing units for saving and/or exchanging data, 

- peripherals jointly usable by all processing unitsI/Oo, 

I/Oi, REGo/ REGi, CMEM for saving and/ or exchanging data, 

- the modules EQo, EQi assigned to the processing units 
35 whereby the modules EQo, EQi feature a unit to monitor 

transactions as well as a unit to stop the processing 
units until the current transaction has been reached by 



the processor units as well as a xinit Lo, Li to transfer 
parameters of the transactions to other modules. 

The modules EQo, EQi can, in this case, feature a unit to 
synchronize the processing units, in particular on the basis 
of the following transactions: 

- non-cacheable memory transactions relating to a local 
memory MEMo, MEMi assigned to the relevant processing 
units PROo, PROi and/or 

- input/output transactions for input /output modules l/Oo, 
I/Oi and/ or 

- memory-mapped input/output transactions for external reg- 

isters REGo/ REGi and /or 

- non-cacheable memory transactions relating to a common 
memory CMEM of processing units PROo/ PROi. 

In this case, the modules advantageously feature a unit to 
form the following parameters representative for transac- 
tions : 

- input/output addresses and/or 

- memory addresses and/ or 

- data to be transferred and/ or 

- type of tremsaction and/or 

- a signature formed from the input/output addresses and/or 
the memory addresses and/or the data to be transferred 
and/or the type of transaction. 

For handling external events, for example, interrupts, the 
processing units advantageously have the following features: 

- at least one execution unit EU, 

- at least one instruction coiinter element CIC to count the 

instructions executed by the execution unit since the 
last change into a special operating mode, 

- at least one register element MIR, for which the content 
can be specified by instructions or is able to be fixed, 

- at least one comparator element K to switch over the exe- 
cution unit EU into the special operating mode in re- 
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sponse to the instruction counter CIC matching the regis- 
ter element MIR, whereby in special operating mode the 
buffered external events to be routed to the processor 
modules which influence the processor modules are called 
5 by the processor modules. 

The buffered external events can be called advantageously in 
this case using software, firmware, microcode or hardware. 

10 One advantage that can be seen in this approach is that it 

allows the use of all kinds of new or existing software on a 
hardware fault-tolerant platform, whereby a processing unit 
which supports the invention can be used in this platform 
without there being a requirement for clock- synchronous, de- 

15 terministic operation of the CPU. 

Further benefits are: 

- The processing units which are redundant to one another, 

typically made up of a CPU, a Northbridge and local mem- 
20 ory, do not have to be operated coupled in strict phases - 

- The CPUs do not have to be identical, which especially 
allows the simultaneous use of different CPU steppings 
within a redundant system, and can be operated with dif- 
ferent clock frequencies. 

25 - The CPUs can behave in different ways with regard to 
speculative execution of instructions . 

- Different CPU-internal execution times of identical CPUs, 
e.g. as a result of corrections after the appearance of 
alpha particles which corrupt the data merely lead to the 

30 synchronization events being reached at slightly differ- 

ent points in time. 

The problems described in ensuring a clock-synchronous, de- 
terministic method of operation lead, as a result of the im- 
35 precise timing of future CPUs, to instruction execution that 
cannot be correlated exactly. Since the CPU for a typical ap- 
plication must react to external events, e.g. to an interrupt 
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generated by a peripheral or to data written by a device into 
memory, should be ensured that the CPU is made aware of these 
events at identical points in the instruction execution, 
since otherwise the evaluation of these events could lead to 
5 different program execution sequences of redundant CPUs. 

The present invention provides that external events relevant 
to the execution of the program, such as for example inter- 
rupts or data generated by external devices, is presented to 
10 redundant CPUs at identical points in the instruction execu- 
tion and thereby that the locks tep operating mode can be emu- 
lated. 



In addition, output events of redundant CPUs that are pre- 
15 sented at identical points in the instruction execution se- 
quence are compared and the results thus evaluated. By con- 
trast to the known methods which bring about the synchroniza- 
tion and distribution of data from the processor periphery by 
software-based methods, this method is implemented by the in- 
20 vention using hardware. Another advantage in this case is 

that the effect on performance is many times less than with 
software-based methods . In addition the described procedure 
is fully transparent for the application and operating system 
software, i.e. existing application and operating system 
25 software can continue to be used without modifications. 



BRIEF DESCRIPTION OF THE DRAWINGS 
Exemplary embodiments of the invention are explained in more 
detail below in conjiinction with the Figures. 

30 

Figure 1 shows two processing units with assigned periphery 
and synchronization of transactions. 



35 



Figure 2 shows two processing units that are synchronized by 
two modules on the basis of their peripheral transac- 
tions. 
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Figure 3 shows the layout of a preferred processing unit with 
further details. 

Figure 4 shows a timing diagram of the instruction processing 
5 of two differently clocked processing units and their 

synchronization in accordance with the invention. 

DETAILED DESCRIPTION OF THE INVENTION 
In Figure 1, two processing units PROo, PROi are shown sche- 

10 matically for which the external transactions are synchro- 
nized. Transactions for the following components are shown as 
examples: local memory MEMq, MEMi, registers REGo, REGi and 
input/output modules or I/O modules 1/Oq, I/Oi. In this case, 
the first processing unit PROo is assigned the first compo- 

15 nents MEMo, REGo and I /Go whereas the second processing unit 
PROi is assigned the second components MEMi, REGi and I/Oi. As 
shown by the corresponding dashed line connections the proc- 
essing units have access to the registers REGn and the I/O 
modules I/Qm of the other processing unit in each case, 

20 whereas the assigned processing unit PROk has access to the 
local memory MEMjc. 

Furthermore, a typical component to which the processing 
units have common access is shown, here the common memory 
25 CMEM, whereby, by contrast to the registers and the I/O mod- 
ules, the common memoiry is not assigned to either of the 
processing units . 

Figure 2 shows two processing units and typically the I/O 
30 modules as well as the registers from Figure 1. These are not 
conventionally connected directly via the corresponding in- 
terfaces or interface modules, but by means of equalizer mod- 
ules EQo, EQi. Accesses by processing unit PROo are received 
by the equalizer EQo, processed and forwarded accordingly, 
35 likewise the processing unit PROo is presented with external 
data and events by the equalizer EQo. Similarly the process- 
ing unit PROi is assigned an equivalent equalizer EQi. 



The equalizers EQo, EQi exchange information and for this 
purpose advantageously feature a fast and direct connection 
Lo, Li. This connection can, as shown, be divided logically 
5 and/or physically into a first connection Lo:EQo -> EQi and a 
second connection LiiEQi -> EQo. 

As shown in Figure 2 by dashed lines, in accordance with the 
present invention, further units consisting of a processing 
10 unit PRO/ an equalizer EQ and peripherals REG, I/O can be 

connected to form a corresponding system with multiple redun- 
dancy. Adding a further unit of this type would produce a 
triple redundancy system in which error handling can already 
be performed by a two out of three multiple decision. 

15 

Figure 3 shows a more detailed implementation of the inven- 
tion in conjunction with a conventional processor/periphery 
architecture with the outstanding feature that a Central 
Processing Unit CPU is linked via a Northbridge interface 
20 unit NB to a Southbridge interface unit SB, whereby the 

Northbridge for example also includes the interface with the 
local memory MEMo while the Southbridge typically comprises 
an interrupt controller and other I/O fxinctions. 

25 As shown by way of example in Figure 3, a processing unit 
PROo can be constructed from a CPU, a Northbridge and local 
memory. The CPU in a particularly advantageous embodiment 
can, as well as the conventional units, of which for the sake 
of simplicity only a cache memory and an execution unit EU 

30 are shown, also include a register MIR, a counter CIC and a 
comparator K which serve to forward external events as inter- 
rupts and exceptions to the execution unit at specific points 
in the instruction execution sequence and to guarantee an 
otherwise uninterruptible processing sequence for the in- 

35 structions. 
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The equalizer module EQo in accordance with the invention is 
preferably located between the Northbridge and Southbridge 
since the interface between the Northbridge and Southbridge 
features all the necessary signal lines to allow the equal- 
5 izer to comply with the processing of the instruction se- 
quences until the synchronicity of the processing unit PROo 
with adjacent processing units (not shown) is achieved. The 
connections Lq, Li for connecting the equalizer EQo to equal- 
izers of adjacent processing units are only indicated. 

10 

The logical grouping shown in Figure 3 does not necessarily 
correspond to the physical grouping of the individual compo- 
nents. For example, the Northbridge can be integrated into 
the CPU or the equalizer can be integrated into the 
15 Northbridge or the Southbridge or be with the Northbridge in 
the CPU. 

Figure 4 is a graphic representation of the synchronization 
of the instruction execution of two processing units in the 

20 form of a timing diagram. In the example shown in Figure 4, 
identical instruction sequences are processed by two CPUs 
CPUo and CPUi, in which case CPUo is operated at a lower clock 
rate than CPUi. This means that CPUi reaches each instruction 
at an earlier point than CPUo, provided that at the begin- 

25 ning, i.e. on processing of mov rl, r2, the registers and the 
memory assigned to the CPUs were synchronized. 

This non- synchronous instruction processing is tolerable as 
long as the CPU does not interact with the outside world, for 

30 example by means of I/O modules or access to common memory. 

For transactions of this type, in the example of Figure 4 the 
reading out of I/O-register 0x87654321, it is however neces- 
sary for these transactions to occur simultaneously for both 
CPUs and especially with the same result. This is achieved 

35 using the equalizers, as described below. At the same time 
the equalizers ensure at such transaction points that the 
synchronicity of the CPUs is restored. 
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In line with the lockstep operating mode previously men- 
tioned, the inventive method is called emulated lockstep be- 
low. An implementation for the emulated lockstep consists of 
5 at least two processing xinits PROo and PROi which can com- 
prise a CPU, memory and memory control device (Northbridge of 
a standard chipset) . The construction of these processing 
units is identical but they can feature different CPUs or 
different steppings of a CPU and are started in an identical 
10 state, i.e. identical memory and CPU register contents. Link- 
age via common or synchronized clocks is not required in ac- 
cordance with the invention. 

As part of the machine code instruction sequence the CPUs 
initiate memory cycles, for example write cycles, read cycles 
and if necessary I/O cycles. The cycles that fulfill the fol- 
lowing conditions are suitable for synchronization of the 
CPUs: 

(a) they are instruction deterministic, i.e. they will be 
issued identically by all CPUs at the same point in the 
program and in the same sequence, and 

(b) they will be issued outwards by the CPUs, i.e. they are 
always visible and can be tapped outside the processors; 
processor-internal cache cycles are unsuitable for example 

The following memory cycles fulfill these general require- 
ments for example: 

- non-cacheable memory cycles in own memory MEMo, MEMi, 

- I/O cycles, 

- memory mapped I/O cycles, for example to external regis- 
ters REGo/ REGi, 

- non-cacheable memory cycles to a common memoary CMEM. 

Various external registers, e.g. timers, counters and/or an 
35 interrupt logic, as well as I/O units to the outside world, 
e.g. Ethernet controllers or SCSI controllers are generally 
in communication with the CPU. between CPU and I/O unit is an 
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equalizer is connected for each CPU via an asynchronous or 
synchronous interface which implements the emulated lockstep 
operation- Asynchronous or synchronous point-to-point connec- 
tions Lo, Li are required between the equalizers EQo, EQi to 
5 enable data, addresses or signatures to be exchanged. At the 
asynchronous interfaces a repeat password of data transmis- 
sion can be provided in the case of transmission errors. 

Read or write data access to I/O units or registers is vmder- 
10 taken as memory mapped I/O or direct I/O. The I/O xmits are 
visible and accessible via separate memory addresses- By con- 
trast the registers can be connected in a master-master or a 
master-slave configuration. With the master-master configura- 
tion the registers of the relevant assigned processing unit 
15 are accessed for reading or writing. A requirement for this 
mode of operation is that registers are in the same state 
when accessed by the processing units, in order to guarantee 
the parallel operating mode of the units. 

20 With the master-slave configuration the units exclusively 
read the registers of the master unit and the registers of 
the master unit are written to by the master unit. For exam- 
ple, to read out the current time from the units, the Time- 
of-Day counter (ToD) of the master \init is used to ensure 

25 that the units are supplied with exactly the same time when 
the ToD counter is read out, i.e. the registers assigned to 
one processing unit are accessed. Events such as interrupts 
that occur at other units are then transferred to the master 
unit. Write access into these registers take place on the 

30 units or be stored in memory in shadow registers to enable 

processing to continue with the new master unit in the event 
of an error. This can be controlled either using software or 
hardware . 

35 Individual transactions and the synchronization processes 
that take place on the basis of these transactions are de- 
scribed below in more detail. 
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Read transactions 

The Read instruction of a CPU of a processing unit PRO reads 
data from an I/O unit. A Read instruction of this type is il- 
5 lustrated in Figure 4, a typical such instruction is load 

rl, [0x87654321] . This instruction is generated by CPUs at the 
same point in instruction processing and is directed to a 
specific I/O unit, for example I/Oo, or a master register. 
The time of the Read instruction can however be different at 
10 the different CPUs. In Figure 4 the given Read instruction 
reaches CPUo later than CPUi. 

The I/O address or memory address generated by the CPU and 
the attributes of the transaction, e.g. Memory Read or I/O 

15 Read or data length or a signature generated from address and 
attributes is sent by the equalizer connected directly to the 
CPU to other equalizers. When the equalizer which is con- 
nected to the addressed I/O resource detects that the Read 
request has been generated by CPUs is the actual read access 

20 executed. With master-slave configurations the read data is 
distributed to equalizers which then complete the Read in- 
struction of the relevant connected CPU by forwarding the 
data to the CPU. The data can arrive at the CPUs at different 
times but this will not affect subsequent execution of the 

25 program. 

Should the I/O address or signature differ for an equalizer, 
read access is either not executed and an error interrupt is 
generated, for example a non-maskable interrupt NMI to the 
30 CPU, or a majority decision, e.g. 2 out of 3 is made if the 
configuration involved has 3 available CPUs. The faulty unit 
will be disconnected and diagnosed. 

To detect failures of individual units the read accesses are 
35 timed, i.e. the Read- instructions of CPUs are generated 

within a certain pre-specif ied time. If this period of time 
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between the instructions is exceeded a timeout will be gener- 
ated, the failed unit is disconnected and diagnosed. 

Read accesses are processed in the sequence in which they ar- 
5 rive. No provision is made for overtaking. 

Write transactions 

The Write instruction writes data into an I/O unit or a mem- 
ory unit. This instruction is generated by CPUs at the same 
10 point in instruction processing eind is directed to a specific 
I/O unit, for example I/Oq. The time of the Write instruction 
can however be different at the CPUs, 

The I/O address, the date and the signature typically gener- 
15 ated by the CPU and the attributes or a signature calculated 
from them is sent by the directly connected equalizer to 
other equalizers. When the write request has been generated 
by CPUs and was validated by the equalizer is the actual 
write access executed. 

20 

Should the I/O address, the date and/or the attributes or the 
signature differ for an equalizer, write access is either not 
executed and an error interrupt is generated, for example a 
non-maskable interrupt NMI to the CPU, or a majority deci- 
25 sion, e.g. 2 out of 3 is made, if the configuration involved 
has 3 available CPUs. The faulty unit will be disconnected 
and diagnosed. 

To detect failures of individual units the write accesses are 
30 timed, i.e. the Write instructions of CPUs are generated 

within a certain pre-specif ied time. If this period of time 
between the instructions is exceeded a timeout will be gener- 
ated, the failed unit is disconnected and diagnosed. 

35 Write accesses are processed in the sequence in which they 
arrive. No provision is made for overtaking. It is, however, 
possible for a number of write cycles to be created by the 
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CPU (known as posted Writes) . For the handling of these mul- 
tiple write transactions an appropriately dimensioned first- 
in-first-out memory (not shown) can be provided. 

5 Interrupts 

The external events influencing the program execution se- 
quence are not routed to the program directly, but first 
buffered by suitably designed hardware. In such cases this 
hardware can be a component of a module outside the CPU or a 

10 component of the CPU itself. The CPU includes a counter CIC 

(Completed Instruction Counter) which counts the machine code 
instructions which have been completed by the CPU. The CPU 
also includes a register MIR (Maximum Instruction Register) , 
into which data is written by software supporting the emu- 

15 lated lockstep operation (ELSO) . 

The CPU further features a comparator K which compares the 
number of executed instructions, that is counter CIC, with 
register MIR and, if they are the same, typically generates 

20 an interrupt request which interrupts instruction execution 
after the nxamber of instructions specified by register MIR 
and switches the CPU into another operating mode. In this op- 
erating mode suitable microcode is executed for example or a 
branch is made to an Interrupt Service Routine or hardware 

25 signals are used to show that this synchronization point has 
been reached. In this operating mode, the external events are 
then presented to the redundant CPUs such that, after leaving 
this operating mode, CPUs can evaluate these events in the 
same way and thereby execute the same instructions as a re- 

30 suit. 

Typically the CPU branches after reaching the number of ma- 
chine instructions specified in register MIR to an Interrupt 
Service Routine, in which the state of the interrupt signals 
35 kept away by the described hardware of the CPU is interro- 
gated so that a rediindant CPU which if necessary makes this 
request at a slightly later point in time, receives identical 
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information. This interrogation is, for example, a read data 
access to an interrupt register. This read access is handled 
as described above, which ensures that CPUs read the same in- 
terrupt vector and initiate the same actions . 

5 

Before the special operating mode is quit, counter CIC is re- 
set. Subsequently, a branch is made back to the point in the 
program at which the interrupt occurred as a result of the 
counter value CIC specified in register MIR. The CPU will 
10 then again execute the number of machine instructions speci- 
fied by register MIR and when counter CIC reaches register 
value MIR, will change the mode and thereby allow external 
events to be accepted. 

15 For example, ELSO software that supports the emulated lock- 
step operation can set register MIR to a value of 10,000. A 
CPU which is operated with a 5 GHZ clock frequency and on av- 
erage executes one machine instruction per clock (length of a 
clock: 1/200 ps) would thus be interrupted in its instruction 

20 processing after 2 fxs and allow synchronization with external 
events . 

Direct Memory Access DMA 

With a DMA transaction (Direct Memory Access) an I/O unit can 
25 have direct read and write access to memory. The timing rela- 
tionship of an access of the I/O unit and CPU is not given. 
Were the CPU to access the same memory area during a DMA 
transfer, processing units could lose the pseudo- synchronous 
mode of operation since the main memory of the processing 
30 units is no longer necessarily identical at the time of ac- 
cess . 

For a DMA transaction it must therefore be ensured that a no- 
tification is sent to the CPU which arrives at CPUs at the 
35 same point in the instruction execution. A number of solu- 
tions are demonstrated for this below. 
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For example, the notification can be sent by the I/O unit 
generating an interrupt after completion of the DMA 
transfer which tells the CPU that the transfer is com- 
pleted and that the transferred memory area is released 
again. As a result of the interrupt the interrupt status 
of the source, that is of the I/O unit is read. This 
reading via the I/O bus of both units, e.g. the PCI bus, 
forces a serialization of the transactions, so that a 
guaranteed sequence of data generated by the I/O units is 
located in the memory of processing units. 

In another embodiment, when jobs generated by the CPU of 
a processing unit are transferred to the I/O units by the 
CPU, an entry can be made in the register which initiates 
a DMA transfer. Alternatively, scripts or lists which are 
used simultaneously by both the CPU and the I/O unit can 
be located at the I/O unit as local memory. A possible 
data access from the CPU then takes the form of a memory 
mapped Read or Write instruction and it is ensured that 
CPUs operate with the same data. 

In the other direction, when a descriptor generated by 
the I/O unit or the I/O units for the job for the CPU is 
to be in the memory of a processing unit PRO and is read 
out from the CPUs with a polling procedure, the CPUs read 
what is known as an I/O lockout register. Subsequently, 
at least no write transactions of the I/O units are sent 
by the equalizer into the local memory of the processing 
units PRO and the last write transactions sent by the I/O 
\inits are written by the equalizer into the local memory 
of the processing units. The verb frequently used for 
this process is "to flush". It ensures the same content 
in the memories of the processing units in relation to 
write transactions generated by I/O units. Subsequently, 
the memory location in the memory of CPUs is read for 
which the value shows the completion of an I/O job, for 
example. Thereafter, the I/O lockout register is read or 
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written again and an I/O-free register is read or written 
to allow write access to the main memory by the I/O units 
once again. 

5 - In a further embodiment/ the following method can be used 
when the descriptor of the job generated by the CPU or 
the CPUs for the I/O units is to be in the memory of the 
PRO and is to be read out using a polling procedure: The 
CPUs read what is known as an I/O lockout register- Sub- 

10 sequently, at least no more read transactions of the I/O 

vmits are sent to the memory of the processing unit. Sub- 
sequently, a value is written into the memory location in 
the main memory of CPUs which represents a trigger for an 
I/O-MISSION. The I/O lockout register is then read or 

15 written again or an I/O-free register is read or written 

to enable read access to the memory by the I/O units 
again . 

Data comparison 

20 Data which is read by I/O units from the memory is read by 

equalizers from the memory of the connected processing units, 
completely or as a signature, and sent to the equalizers con- 
nected to the requesting I/O unit and compared by the latter. 
Alternatively, the other equalizers can also perform a com- 

25 parison. If it is the same, the data is forwarded to the I/O 
unit. If a difference is detected, a majority choice is made 
if necessary, e.g. 2 out of 3, and the faulty unit is discon- 
nected and diagnosed. 

30 Data which is generated by the CPUs of the processing units 
is sent completely or as a signature to the equalizer con- 
nected to the destination I/O unit and compared by the lat- 
ter. Alternatively, the other units can also perform a com- 
parison. If it is the same, the data is forwarded to the I/O 

35 unit- If a difference is detected, a majority choice is made 
if necessary, e.g. 2 out of 3, and the faulty unit is discon- 
nected and diagnosed- 
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Read requests generated by the CPUs of the processing unit, 
characterized for example by the read command, addresses and 
attributes are sent completely or as a signature to the 
equalizer connected to the source and compared by the latter. 
Alternatively, the other xinits can also perform a comparison. 
If the data is the same, the read transactions are executed 
and the data read is sent to the equalizers. If a difference 
is detected, a majority choice is made if necessary, e.g. 2 
out of 3, and the faulty unit is disconnected and diagnosed. 

With emulated lockstep read and write transactions of the CPU 
are not compared as regards their local memory MEM since this 
can be completely different, e.g. as a result of different 
speculative accesses of the CPUs or different cache behavior. 
To check the contents of memory areas of the different proc- 
essing units PRO for equivalence a check must be made for ex- 
ample by a routining software at a point at which one can be 
sure that the memory contents in the faulty state are consis- 
tent and remain consistent for the duration of the checking. 
Memory checking itself can be undertaken by software, i.e. 
the software/CPU reads a memory area for example, generates a 
checksum and compares the checksums determined by the differ- 
ent processing units. Memory checks can, however, also be un- 
dertaken by hardware with facilities located in the equaliz- 
ers reading the memory of the connected processing units, 
forming a checksum and making a comparison between them- 
selves . 

Multiprocessor architecture with shared memory 

The emulated lockstep operation is also suitable for synchro- 
nizing memory access by a number of processing units to a 
shared memory CMEM and comparing that data as described 
above, provided that the transactions satisfy the general 
conditions explained previously; that is non-cacheable memory 
transactions for example. 
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Thus it is possible in a further embodiment to define multi- 
processor configurations that consist of a number of proces- 
sors (with local memory) which can access a common memory 
CMEM. In this case, each processor unit is duplicated for 
reasons of redundancy and for error detection, i.e. a proces- 
sor unit includes two identical processing units PRO (not 
shown) which perform tasks in parallel and among other things 
synchronize for access to common memory and in doing so per- 
form a data comparison. 
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